Summary of Pearl 2014
Two statisticians are interested in determining whether there are differences among males and females in weight gain over the course of semester. In the original depiction by Lord, there is an implication of some diet treatment, but all that can be assumed is that those under consideration both received the ‘treatment’ if there was one. The two statisticians take different approaches to examining the data, yet come to different conclusions.
The following graph is from Pearl 2014. The ellipses represent the scatterplots for boys and girls. The diagonal 45o degree line would represent no change from time 1 to time 2. The center of the ellipses are both on this line, and thus the mean change for boys and girls are identical and zero. The density plot in the lower left depicts the distribution of change scores centered on this zero estimate.
Statistician 1 focuses on change scores, while statistician 2 uses an ancova to examine sex differences at time 2 while adjusting for initial weight.
The model can be depicted as a directed acyclic graph as follows.
In the above we can define the following effects of sex on weight:
Gain is completely determined as the difference between final weight and initial weight.
In summary, the two statisticians are focused on different effects from the same model.
We can get an explicit sense of the results by means of a hands on example. In the following we have simulated data that will reproduce the situation described thus far. Parameters were chosen for visual effect. One thing that’s not been noted about the example is that it likely would not occur. The total effect by definition would be larger than the direct effect as there would be strong sex differences at initial weight, and a strong correlation between initial and final weight, all in a positive manner. The other issue unadressed is that the entire focus is on whether an effect exists hinges on p-values, and with large enough data and such simple models, anything would flag significant. Plus there are issues of variance, nonlinear relations etc.
library(dplyr)
set.seed(1234)
N = 200
group = rep(c(0, 1), e=N/2)
pre = .75*group + rnorm(N, sd=.25)
post = .4*pre + .5*group + rnorm(N, sd=.1)
change = post-pre
df = data.frame(id=factor(1:N), group=factor(group, labels=c('Female', 'Male')), pre, post, change)
dflong = tidyr::gather(df, key=time, value=score, pre:post) %>% arrange(id)
head(df)
id group pre post change
1 1 Female -0.30176644 -0.07218389 0.2295825445
2 2 Female 0.06935731 0.09741980 0.0280624915
3 3 Female 0.27111029 0.12699551 -0.1441147849
4 4 Female -0.58642443 -0.16449642 0.4219280069
5 5 Female 0.10728117 0.07408057 -0.0332006005
6 6 Female 0.12651397 0.12665183 0.0001378524
head(dflong)
id group change time score
1 1 Female 0.22958254 pre -0.30176644
2 1 Female 0.22958254 post -0.07218389
3 2 Female 0.02806249 pre 0.06935731
4 2 Female 0.02806249 post 0.09741980
5 3 Female -0.14411478 pre 0.27111029
6 3 Female -0.14411478 post 0.12699551
In the following we’ll use lavaan to estimate the full mediation model, then run separate regressions to demonstrate the t-test on change vs. the ancova approach. As noted above, the t-test on change score measures the total effect of sex on final weight, while the ancova measures the direct effect. It is unnecessary to distinguish them as separate modeling approaches, as they are merely regressions with different target variables.
mod = "
pre ~ a*group
post ~ b*group + c*pre
# change ~ -1*pre + 1*post
# total effect
TE := (b + a*c) - a
"
library(lavaan)
lpmod = sem(mod, data=df)
summary(lpmod)
lavaan (0.5-20) converged normally after 28 iterations
Number of observations 200
Estimator ML
Minimum Function Test Statistic 0.000
Degrees of freedom 0
Minimum Function Value 0.0000000000000
Parameter Estimates:
Information Expected
Standard Errors Standard
Regressions:
Estimate Std.Err Z-value P(>|z|)
pre ~
group (a) 0.800 0.036 22.317 0.000
post ~
group (b) 0.447 0.026 17.028 0.000
pre (c) 0.445 0.028 16.043 0.000
Variances:
Estimate Std.Err Z-value P(>|z|)
pre 0.064 0.006 10.000 0.000
post 0.010 0.001 10.000 0.000
Defined Parameters:
Estimate Std.Err Z-value P(>|z|)
TE 0.004 0.024 0.165 0.869
summary(lm(change~group, df)) # t-test on change scores = total effect
Call:
lm(formula = change ~ group, data = df)
Residuals:
Min 1Q Median 3Q Max
-0.5174 -0.1133 0.0208 0.1310 0.4612
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.038975 0.017300 2.253 0.0254 *
groupMale 0.004028 0.024466 0.165 0.8694
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.173 on 198 degrees of freedom
Multiple R-squared: 0.0001369, Adjusted R-squared: -0.004913
F-statistic: 0.02711 on 1 and 198 DF, p-value: 0.8694
summary(lm(post~group+pre, df)) # 'ancova' uncovers direct effect etc.
Call:
lm(formula = post ~ group + pre, data = df)
Residuals:
Min 1Q Median 3Q Max
-0.315819 -0.066341 0.005835 0.060977 0.268154
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.01724 0.01008 1.71 0.0888 .
groupMale 0.44744 0.02648 16.90 <2e-16 ***
pre 0.44538 0.02797 15.92 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.1002 on 197 degrees of freedom
Multiple R-squared: 0.9463, Adjusted R-squared: 0.9457
F-statistic: 1734 on 2 and 197 DF, p-value: < 2.2e-16
Use change score adjusting for initial weight. Note that this duplicates the ancova result for the group effect (i.e. it is the direct effect). But note that the coefficient for initial weight is equivalent to the ancova coefficient -1. See for example, Senn 2006.
Call:
lm(formula = change ~ group + pre, data = df)
Residuals:
Min 1Q Median 3Q Max
-0.315819 -0.066341 0.005835 0.060977 0.268154
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.01724 0.01008 1.71 0.0888 .
groupMale 0.44744 0.02648 16.90 <2e-16 ***
pre -0.55462 0.02797 -19.83 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.1002 on 197 degrees of freedom
Multiple R-squared: 0.6662, Adjusted R-squared: 0.6628
F-statistic: 196.6 on 2 and 197 DF, p-value: < 2.2e-16
Wainer & Brown 2007 took a different interpretation of the paradox
*The choice of comic sans font in the graph due to Wainer and Brown and should be held against them.
- low birthweight children have higher mortality rate (100 fold higher)
- children of smoking mothers notably more likely to have low birghtweight
- low birthweight children born to smoking mothers have a lower mortality rate
- Conclusion: expectant mothers should start smoking!
Pearl 2013